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ABSTRACT 

We look at the problem of merging direct and remotely sensed (indirect) data with fore- 
cast data to get an estimate of the present state of the atmosphere, for the purpose of numer- 
ical weather prediction. To carry out this merging optimally, it is necessary to provide an esti- 
mate of the relative weights to be given to the observations and forecast. It is possible to do 
this dynamically from the information to be merged, if the correlation structure of the errors 
from the various sources is sufficiently different. We describe some new statistical approaches 
to doing this and quantify conditions under which such estimates are likely to be good. 

1. INTRODUCTION 

We have been studying various aspects of the problem of simultaneously combining 
information from various sources, for the purpose of obtaining initial conditions of the atmo- 
sphere for numerical weather prediction. By information, we mean data from diverse instru- 
ments, information from a forecast, prior information concerning the atmosphere, and physi- 
cal constraints. See Wahba(1981, 1982a, 1985a). Various parts of this what might be called 
"multispectral” point of view, whereby data from different sources is combined, and several 
meteorological parameters retrieved simultaneously, is a major theme in several papers 
presented at this conference, in particular, see Isaacs et al. (1988), Smith et al.(1988), West- 
water et al. (1988) and Rodgers (1988). See also Lorenc(1986). 

In this paper we will first briefly review the variational prescription for combining data 
from different sources and prior information, and its relation to Gandin (Bayes) estimation. 
In this prescription are a number of "tuning" parameters, which we will divide into two 
classes. The first class will be called weighting parameters, and the second smoothing (also 
known as bandwidth) parameters. The weighting parameters are those which govern the 
relative weights to be given to various types of observational, forecast, and physical informa- 
tion, while the smoothing parameters control the relative amount of "information" which is to 
be assigned to "signal" and to "noise". All practical forecast models have many such tuning 
parameters, and in practice, they are chosen by trial and error, by the use of externally meas- 
ured data on various sources of error and strength of signal, and in very simple cases, by Kal- 
man filtering, which propagates estimates of covariances forward in the model. 
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it may model the relationship between directly measured quantities such as the horizontal 
wind field, and state variables in the forecast model such as the coefficients in a spherical har- 
monic expansion of stream function and velocity potential 

We will suppose that a reasonable model for the state variables (the mean having been 
subtracted off), is 

Es - 0, Ess ' = 6E 

See Wahba( 1982b), for further references and a discussion as to how E may be obtained from 
historical data. 

This approach results in a mandate to find s as the minimizer of the variational problem: 
(yO) - J p( 1 )(5))j2i 1 (y (1) -F w (s))+ y(y (2) -/ r(2) (s))C 2 1 (y (2) -^ (2) CO) 

+ Ay T T's , 

2 a i 

where r = ~4 and A=- 1 . See Wahba (1981, 1982a, 1985a), O’Sullivan and Wahba(1985). 

a\ b 

We note that physical constraints on the state vector s or functions of s can be inserted by 
solving the variational problem above subject to these constraints, see Villalobos and 
Wahba(1982), Svensson(1985). Svensson put constraints on the dry adiabatic lapse rate when 
solving a variational problem of this form with satellite radiance data to estimate vertical 
temperature profiles. 

In the linear Normally distributed case the estimate of s which minimizes the above vari- 
ational problem is the Gandin(Bayes) estimate of s, given the prior covariance of s and the 
covariance matrices of the errors. See, e. g. Kimeldorf and Wahba(1970). Kalman filter 
theory would give s as the minimizer of the variational problem with the term preceeded by A 
absent. Here the inclusion of this term enters apriori (smoothness) information that may 
have been lost via model error. 

If r, Qi and Qz are known, then A and some parameters inside E may be estimated by 
the GCV See, for example, Wahba and Wendelberger(1980), Wahba( 1982b), O’Sullivan and 
Wahba(1985), Merz(1980). 

3. DYNAMIC ESTIMATION OF WEIGHTING PARAMETERS 

3.1 A SIMPLE EXAMPLE - 500 mb RAOBS AND FORECAST 

We first illustrate the method and results by letting y (1) be observed 500mb heights and 
y (2) be forecast 500mb heights. It is reasonable to take Q i to be I for the observations since 
these measurements can be assumed to be independent, with about the same variance from 
station to station. Hollingsworth and Lonnberg(1986) and Lonnberg and Hollingsworth(1986) 
(LH), have recently obtained estimates of r and Q/=Qz from three months data from the 
European Center forecast model. We will use their example and results as an illustration, 
before going on to the general case. 
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Figure 1 shows an estimated 500 mb correlation function from LH. Let hf and h{ be the 
observation and the forecast 500 mb height at station /, and ef and ef be the observation and 
forecast errors, (on a particular day). Let 

Cl * hf - h{ = ef -e{ 

Here, y^-h° andy (2) =/i^, thus both vectors represent the same meteorological quantity, 
that condition will be relaxed later. Letting r/ m be the distance between stations / and m, LH 
assumed that 

Eefef, = GfPiTim ) 

where p(0) =1. If the ef are independent and identically distributed zero mean random vari- 
ables then 


ECiim ~ o\f>i m + o}p(j lm ) 

where 6 lm =1 if / =m and 0 otherwise. LH collected 90 days of values of Ci for each station. 
Letting j index day, they used sample correlations computed from 


1 


90 


&(/>£■(/) 


}p(j) 


as an estimate of o\S, m +o}p(Ti m ). In the figure, sample values of 2 L plotted, 

o\+Ofp(y>) 


and 


A 


Of+ol 


is estimated by extrapolating the smooth part of the curve back to the origin by 


methods described in their paper (quite different than the methods to be discussed here.) 
Figure 2 shows a one parameter family of (synthetic) correlation functions, defined by 


pl(t) = 

where 0(L) is determined by 


(1-2*L)c<k(|^)+i?(L))-^.(i + ^z.))-i 

<l-«(L))- 1 -(l+*(i.))- 1 

= cos^. 


Here the parameter L in km is the distance r for which /)L( r ) = yPL(0) = where Rq is the 


circumference of the earth, in km. This family of (isotropic) correlation functions on the 
sphere has been chosen here partly because of a superficial resemblance to some of the 
curves obtained by LH and partly for mathematical convenience. We wish to use this family 
as a moderately realistic example of the estimation of a single (important) parameter in Qf, 
namely, the correlation half-distance. Further study is needed to determine if it is appropriate 
to include other factors, such as anisotropy, variation with latitude, etc. in this correlation 
function. With this model, letting £ = (£i» . . . ,£„) be a vector of one day’s data, we have that 
the covariance matrix of £ is a\{I + rQf(L)), where the Imih entry of Q/(L) is pl( t /w )• The 
GCV estimate of r and L can be shown to be the values of r and L which minimize 
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We now consider the general (linear) case, where 

yO) = p0) s + 

where is of dimension n (, \i = 1,2 and the covariance of is a}Qi. To estimate r, we 
must be able to construct two matrices and B ^ of dimension nxn^ and nxn ^ 2) respec- 
tively, which satisfy =B^F^ 2 K Let z ( '* be defined by 

z (o = £(«yo / =1,2 

and let w be defined by 

w =Z (D. Z ( 2) =j g(i) y (i).5(2) y (2) = B (i) £ (i). B {i) £ (2) 

The covariance matrix of w is then 

Ew'w = a\B^ Q iB^ ' + a 2 B^ Q 2 B^ 

Suppose is of full rank, then we can take the Cholesky decomposition LL' of 

B w QiB (1 \ where L is lower triangular, and let £ =L‘ 1 w. Then the covariance matrix of £ is 

E(V = a\(I+rQ), 

where Q =L' l B^Q 2 B^L A The ML and GCV estimates are then given by the minimiz- 
ers of V and M of Section 3.1, and the estimability of r depends on the properties of Q. 
Loosely speaking, the two vectors and z® need have to have their "energy" at different 
"wavenumbers". 

3.4 COMPUTATIONAL CONSIDERATIONS 

Computation of r and L minimizing quantities similar to M and V has been carried out 
in a relatively straightforward way using matrix decompositions for n up to a few hundred, on 
a VAX 11/720 in the Statistics Dept, at the University of Wisconsin- Madison, and with n up 
to about 800 on the Cray XMP at the San Diego Super Computer Center, without much 
optimization of the code, in under 150 seconds, using GCVPACK (Bates et al. (1985)). 
Research is continuing on more efficient methods for larger problems. It is probably true that 
large data sets will be required in practice. 

3.5. POTENTIAL APPLICATIONS 

A possible important application is to the comparison of satellite observed radiances to 
forecast. It is possible to observe certain gross features of the atmosphere in two dimensional 
plots of satellite (raw) radiances. Forecast errors frequently tend to be phase errors, with 
certain spatial features displaced in space. To compare forecast y (2 * with satellite radiance 
datay (1) following the approach in this paper, one should compute from the forecast, radi- 
ances that would be seen by the satellite, that is, let B^ be I and B ^ map forecast into radi- 
ance data as would be seen by the satellite. Assuming that realistic Q\ and Q 2 can be esta- 
blished (remember, Q 2 is in the radiance observational domain), it is likely that r could be 
estimated, and used to help decide wether to trust the forecast or the radiance data in the 
event of a major discrepancy. 
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